Dear Dr. Davis:


This  is  written  to  provide  a  semi-annual  progress  report  for  the  contract  N00014-91-J-1316 
entitled  “Theoretical  and  Experimental  Research  into  Biological  Mechanisms  Underlying 
Learning and Memory.” The major goal of our research is to elucidate the biological mechanisms
that underlie learning and memory: to find principles of organization that can account
both for experimental data on the cellular level and, when applied to large numbers of neurons
that receive sensory and/or interneuronal information, for various higher level system
properties. 

Among  our  detailed  objectives  are  the  following:  to  clarify  the  dependence  of  learning  on 
synaptic modification, to elucidate the principles that govern synapse formation or modification,
and to use principles of organization that can account for observations on a cellular level
to construct neural-like systems that can learn, associate and reproduce such higher level
cognitive acts as abstraction and computation.

The  approaches  employed  to  achieve  these  objectives  include  both  theory  and  experiment. 
We have tested theoretical and experimental consequences of the hypothesis that synapse
modification depends both on local information (in visual cortex, dominated by the inputs
from the eyes carrying specific visual information), in accordance with theoretical ideas we
have developed, and on global instructions that affect large numbers of synapses and come
perhaps from modulatory transmitters such as norepinephrine. In addition, various
principles that appear to be operating on the cellular level have been used to construct
models of higher level functions.

One  of  our  key  objectives  is  to  produce  real  interaction  between  theory  and  experiment. 
The  means  for  achieving  this  has  been  a  continuing  dialogue  between  experimentalists  and 
theoreticians  that  has  produced  a  genuine  collegial  relationship  in  which  experts  in  very 
different  disciplines  can  understand  each  other’s  language. 

1  Simulations  using  natural  inputs 

A key simplification used up to now in simulations and analysis of the evolution of BCM
neurons has been the model of the visual environment. In the past contract period we have
begun an investigation of the validity of the rearing environment model used in the CBC
simulations of visual deprivation experiments, testing it against a more realistic model of
visual experience. Natural images preprocessed by a retinal filter were used to generate input
to a single cell model of synaptic plasticity in visual cortex. The simulations of normal
rearing, monocular deprivation and reverse suture using these realistic inputs produced
results similar to those of the CBC simulations, which used abstract one-dimensional inputs.

These  simulations  used  a  model  of  the  kitten  visual  system  from  the  retina  to  primary 
visual  cortex.  A  single  neuron  represented  the  cortex,  and  the  BCM  theory  was  used  to 
model its synaptic plasticity. Circular regions from the left and the right retinas, covering
the same visual space, were used to generate input to the single BCM neuron. The lateral
geniculate  nucleus  (LGN)  was  assumed  to  simply  relay  the  signal  generated  by  the  retina  to 
the  visual  cortex. 

Each  retina  included  an  array  of  ganglion  cells  spaced  one  unit  apart,  and  an  array  of 
receptors which were also spaced one unit apart. Only ganglion cells whose receptive-field
midpoints fell within a circular visual area with a radius of five units were included in
the model. Each ganglion cell had an antagonistic center-surround receptive field which
approximated  a  difference  of  two  Gaussians.  The  standard  deviation  of  the  center  Gaussian 
was  1  unit,  and  the  standard  deviation  of  the  surround  Gaussian  was  3  units.  This  created 
a  receptive  field  center  with  a  radius  of  2.22  units.  The  receptive  field  of  each  ganglion  cell 
was  balanced  so  that  uniform  illumination  of  any  intensity  resulted  in  spontaneous  activity. 
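
The quoted center radius can be checked with a short calculation. The sketch below assumes that each Gaussian is normalized to unit volume, which is one simple way to satisfy the balance condition; the report does not state the normalization, and the function names are illustrative only.

```python
import math

def dog(r, sigma_c=1.0, sigma_s=3.0):
    """Difference of two unit-volume 2-D Gaussians at radius r (assumed normalization)."""
    center = math.exp(-r**2 / (2 * sigma_c**2)) / (2 * math.pi * sigma_c**2)
    surround = math.exp(-r**2 / (2 * sigma_s**2)) / (2 * math.pi * sigma_s**2)
    return center - surround

def center_radius(sigma_c=1.0, sigma_s=3.0):
    """Radius at which the center and surround Gaussians cancel exactly."""
    num = 2.0 * math.log(sigma_s**2 / sigma_c**2)
    den = 1.0 / sigma_c**2 - 1.0 / sigma_s**2
    return math.sqrt(num / den)

r0 = center_radius()
print(round(r0, 2))  # zero crossing of the receptive field profile
```

Under this assumption the zero crossing falls at about 2.22 units for standard deviations of 1 and 3 units, in agreement with the value given above.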

The visual environment of the model consisted of eight gray scale images with dimensions
150 x 150 pixels. For each cycle of the simulation, the activity of the receptors in the retina
was determined by randomly picking one of the eight images, and randomly shifting the image
on the model's retina. The shift was restricted so that none of the ganglion cell receptive
field  centers  fell  within  five  units  of  the  image  border.  The  activity  of  each  receptor  in  the 
model  was  determined  by  the  intensity  of  a  pixel  in  the  image.  This  method  generated  a 
very  large  training  set  because  of  the  many  unique  shifts  which  were  possible.  The  maximum 
ganglion  cell  activity  generated  by  the  patterned  input  was  1.57,  and  the  ganglion  cell  activity 
generated  by  a  sutured  eye  was  simply  noise  uniformly  distributed  in  the  interval  [0.0, 0.8). 
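
The sampling procedure for one simulation cycle can be sketched as follows. This is only an illustration: it assumes that one receptor unit corresponds to one pixel and that "within five units of the border" means closer than five pixels; the helper names are hypothetical and not taken from the simulation code.

```python
import random

IMAGE_SIZE = 150   # image side in pixels (from the text)
N_IMAGES = 8       # number of training images (from the text)
FIELD_RADIUS = 5   # radius of the circular visual area, in units (from the text)
MARGIN = 5         # minimum distance of any ganglion cell from the border

def sample_fixation(rng):
    """Pick an image index and a retina center position (hypothetical helper).

    The center stays FIELD_RADIUS + MARGIN pixels away from every edge, so
    no ganglion cell in the circular area comes closer than MARGIN pixels
    to the image border.
    """
    image_index = rng.randrange(N_IMAGES)
    lo = FIELD_RADIUS + MARGIN
    hi = IMAGE_SIZE - 1 - lo
    cx = rng.randint(lo, hi)
    cy = rng.randint(lo, hi)
    return image_index, cx, cy

def sutured_eye_activity(n_cells, rng):
    """Activity of a closed eye: noise uniform in [0.0, 0.8)."""
    return [0.8 * rng.random() for _ in range(n_cells)]

rng = random.Random(0)
image_index, cx, cy = sample_fixation(rng)
noise = sutured_eye_activity(10, rng)

# Number of distinct (image, position) pairs available under these assumptions.
n_positions = N_IMAGES * (IMAGE_SIZE - 2 * (FIELD_RADIUS + MARGIN)) ** 2
print(n_positions)
```

With these assumptions there are 8 x 130 x 130 = 135,200 distinct image and position pairs, which illustrates why the training set is very large.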

At selected times during the simulations, spots of light were used to characterize the receptive
field of the BCM neuron through the left and right eyes. Two-dimensional maps of the
receptive field were generated by “shining” small spots of light at many locations on a retina
and recording the BCM activity generated for each spot. This is similar to the process used
by Jones and Palmer to generate two-dimensional receptive field profiles of simple cells in
cat striate cortex (Jones and Palmer, 1987). The maxima of the left eye map and the
right eye map were used to determine the binocularity of the BCM neuron.

Figure 1: Natural images used for training the model.

Figures 3, 5 and 7 show the results from simulations of normal rearing, monocular
deprivation and reverse suture. They can be compared to the results of the CBC simulations
shown in figures 2, 4 and 6. The scale of the horizontal axis in these two sets of figures
is different because the simulations using the natural input required many more training
iterations for the BCM neuron to become selective. This can be accounted for by the
additional complexity introduced by the realistic input. As in the CBC simulations, normal
rearing produced a binocular neuron which was equally driven through the left and the
right eyes. The two-dimensional maps of the BCM neuron's receptive field show that it also
develops selectivity to the orientation of a stimulus.

In  both  simulations  of  monocular  deprivation,  the  sutured  eye  disconnects  from  the  BCM 
neuron, and in both simulations of reverse suture the newly closed eye disconnects from the
BCM  neuron  before  the  newly  open  eye  reconnects.  These  results  suggest  that  the  original 
abstract  patterns  distorted  by  noise  were  an  adequate  model  of  visual  experience  for  the 
simulations  of  these  visual  deprivation  experiments. 


2 Localized Principal Components of Natural Images - an Analytic Solution

It has been proven that a neuron with a Hebbian learning rule plus a proper decay term
can  perform  a  principal  component  extraction.  Furthermore,  a  neural  network  with  proper 
lateral  inhibition  can  perform  the  extraction  of  several  principal  components  simultaneously. 
The  computational  importance  of  principal  components  is  that  they  are  the  optimal  linear 
projections  for  minimizing  the  mean  squared  reconstruction  error. 
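
As an illustration of the first statement, the sketch below uses Oja's rule, a well-known Hebbian rule with a decay term; the report does not say which specific rule is meant, so this choice, the toy data and all parameter values are assumptions.

```python
import math
import random

def oja_first_component(samples, rate=0.005, epochs=20, seed=0):
    """Oja's rule: dw = rate * y * (x - y * w), with y = w . x.

    The y * x term is Hebbian; the -y^2 * w term is the decay that keeps
    the weight vector bounded (its norm tends to 1).
    """
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in samples[0]]
    for _ in range(epochs):
        for x in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + rate * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

# Zero-mean toy data stretched along the (1, 1) direction (illustrative only).
rng = random.Random(1)
data = []
for _ in range(2000):
    a = rng.gauss(0.0, 2.0)   # large variance along (1, 1) / sqrt(2)
    b = rng.gauss(0.0, 0.5)   # small variance along (1, -1) / sqrt(2)
    data.append(((a + b) / math.sqrt(2), (a - b) / math.sqrt(2)))

w = oja_first_component(data)
norm = math.sqrt(sum(wi * wi for wi in w))
# Absolute cosine of the angle between w and the true first principal axis.
alignment = abs(w[0] + w[1]) / (math.sqrt(2) * norm)
print(round(alignment, 3), round(norm, 3))
```

After training, the weight vector should be nearly parallel to the direction of largest variance, with a norm close to one.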

Since  the  principal  components  of  a  set  of  inputs  depend  only  on  their  covariance  matrix, 
it  is  reasonable  that  given  this  matrix,  they  can  be  calculated  analytically. 

We believe that it is reasonable to model postnatal development with an environment
composed of natural scenes. The nature of the covariance matrix of natural images was
investigated by Field, who found that the spectrum of the covariance matrix is proportional to
the inverse of the square of the frequency.

We  assume  a  circular  hard  boundary  to  the  receptive  fields,  with  a  radius  equal  to  the  zero 
crossing  of  the  correlation  function. 

We find that the solutions are the Fourier-Bessel functions. We show in section 2.2
that, under the assumption that the covariance matrix spectrum has a small non-rotationally
symmetric correction, the solutions have a definite phase.

2.1  The  Rotationally  Symmetric  Solution 

The principal components are the eigen-functions of the covariance matrix. Therefore the
equation we try to solve is the eigenvalue problem, i.e., the eigen-equation, which has the
form

$$\sum_j C_{ij} W_j = \lambda W_i, \qquad (1)$$

where $W_i$ are eigen-vectors, $\lambda$ is the eigenvalue, and $C_{ij}$ is the covariance matrix, which is
defined as $C_{ij} = E[(I_i - E[I_i])(I_j - E[I_j])]$ for input pattern $\{I_i\}$. Since we are dealing with
two dimensional space, the index $i$ really denotes a point in the two dimensional space, so it is
more convenient to rewrite the covariance matrix in the form $C(\mathbf{r}_i, \mathbf{r}_j)$. Due to translational
invariance, $C(\mathbf{r}_i, \mathbf{r}_j) = C(\mathbf{r}_i - \mathbf{r}_j)$. In the continuous limit, the summation becomes an
integral over $\mathbf{r}'$, thus the eigen-equation becomes

$$\int C(\mathbf{r} - \mathbf{r}')\,\psi(\mathbf{r}')\,d^2r' = \lambda\,\psi(\mathbf{r}), \qquad (2)$$

in which $\psi(\mathbf{r})$ is the continuous limit of the eigen-vectors $W_i$.

The Fourier transform (spectrum) of the covariance matrix has the form $C(\mathbf{k}) = c/k^2$,
where $c$ is a constant. Hereafter we will set $c = 1$ for convenience. Thus $C(\mathbf{r})$ satisfies

$$\nabla^2 C(\mathbf{r} - \mathbf{r}') = -\delta(\mathbf{r} - \mathbf{r}'), \qquad (3)$$

which can be readily proven by taking the Fourier transform of both sides of this equation.
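
The Fourier step can be written out explicitly. The sketch below assumes the transform convention $C(\mathbf{r}) = (2\pi)^{-2}\int C(\mathbf{k})\,e^{j\mathbf{k}\cdot\mathbf{r}}\,d^2k$ with $j^2 = -1$, which the report does not state; other conventions change only constant factors.

```latex
\begin{aligned}
\nabla^2 C(\mathbf{r}-\mathbf{r}')
  &= \frac{1}{(2\pi)^2}\int \frac{1}{k^2}\,
     \nabla^2 e^{j\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}')}\,d^2k \\
  &= \frac{1}{(2\pi)^2}\int \frac{-k^2}{k^2}\,
     e^{j\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}')}\,d^2k \\
  &= -\frac{1}{(2\pi)^2}\int e^{j\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}')}\,d^2k
   = -\delta(\mathbf{r}-\mathbf{r}').
\end{aligned}
```

Each derivative brings down a factor $j\mathbf{k}$, so the Laplacian multiplies the spectrum by $-k^2$, which cancels the $1/k^2$ dependence and leaves the Fourier representation of the delta function.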
Since the correlation function is zero on some boundary, assumed to be a circular boundary
of radius $a$, within this boundary it can be represented as a sum of a complete set of
functions with the same boundary condition. We will choose the Bessel Fourier set $W_{mi}$,
which is zero on the boundary and takes the form


for  r  <  a 
for  r  >  a 


(4) 


in which $m = 0, 1, 2, \ldots$, $J_m(x)$ is the standard Bessel function, $\lambda_{mi}$ is the $i$th root of the
equation $J_m(a/\sqrt{\lambda}) = 0$, and $r$ and $\theta$ are the polar coordinates of $\mathbf{r}$. These functions solve the
differential equation

$$\nabla^2 W_{mi} = -(1/\lambda_{mi})\,W_{mi}. \qquad (5)$$

In this representation the correlation function must take the form

$$C(\mathbf{r} - \mathbf{r}') = \sum_{mi} \lambda_{mi}\,W_{mi}(\mathbf{r})\,W^{*}_{mi}(\mathbf{r}'), \qquad (6)$$

since, remembering that $\delta(\mathbf{r} - \mathbf{r}') = \sum_{kl} W_{kl}(\mathbf{r})\,W^{*}_{kl}(\mathbf{r}')$ and applying equation 5 term by term,
it is easy to see that $C(\mathbf{r} - \mathbf{r}')$ is a solution of equation 3. It is important to notice that this
solution for $C(\mathbf{r} - \mathbf{r}')$ is not unique, since we can add a constant to it and still retain a
radially symmetric solution. This is avoided
by choosing the boundary $a$ such that this constant is 0, which implies that the hard wired
connections between retina and neurons must have a spatial extent which is equal to the
zero crossing of the correlation function.

Thus plugging the correlation function of equation 6 into the eigen-equation, representing the
eigenfunctions as well as sums of this complete set, $\psi(\mathbf{r}) = \sum_{jn} B_{jn} W_{jn}(\mathbf{r})$, and using the
orthogonality of these functions over the interval, we obtain

$$\sum_{kl} \lambda_{kl} B_{kl} W_{kl}(\mathbf{r}) = \lambda \sum_{kl} B_{kl} W_{kl}(\mathbf{r}),$$

for which the solution is that only one of the coefficients, $B_{ij}$, equals 1 and the rest are zero. The
corresponding eigenvalue is $\lambda = \lambda_{ij}$. Thus the solutions are the Bessel Fourier functions


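$$W^{1}_{mi}(\mathbf{r}) = \begin{cases} J_m\!\left(r/\sqrt{\lambda_{mi}}\right)\cos(m\theta + \phi_{mi}) & \text{for } r < a \\ 0 & \text{for } r > a \end{cases}$$

$$W^{2}_{mi}(\mathbf{r}) = \begin{cases} J_m\!\left(r/\sqrt{\lambda_{mi}}\right)\sin(m\theta + \phi_{mi}) & \text{for } r < a \\ 0 & \text{for } r > a \end{cases} \qquad (7)$$

where $\phi_{mi}$ is a set of undetermined phases. These two eigen-functions have the same eigenvalue
$\lambda_{mi}$, i.e., they are degenerate.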

If we order the solutions by the magnitudes of the corresponding eigenvalues $\lambda_{mi}$, the first
ten solutions, with $\phi_{mi} = 0$ and $a = 1$, are drawn in figure 8.

2.2 Retrieving the Phase

The solutions above, $W^{1}_{mi}(\mathbf{r})$ and $W^{2}_{mi}(\mathbf{r})$, not only have undetermined phases, but also are
degenerate. This contradicts the results of the simulations performed by Hancock, in which
the phases seem to always take the value zero, and the $W^{1}_{mi}$ solution has a different eigenvalue
from the $W^{2}_{mi}$ solution. These results can be retrieved if we assume that the covariance matrix
has a non-rotationally symmetric perturbation term. This assumption is not arbitrary, since
an inspection of Field's results reveals that this is indeed the case. Hereafter we assume this
perturbation term has, in k space, the form

$$C'(\mathbf{k}) = U(k)\,T(\theta_k). \qquad (8)$$


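In order to calculate this perturbation, the representation of this perturbation in the two
degenerate eigen-functions $W^{1}_{mi}(\mathbf{r})$ and $W^{2}_{mi}(\mathbf{r})$ has to be calculated. It is easier to perform
this in k space, in which the eigen-functions $W^{1}_{mi}(\mathbf{r})$ and $W^{2}_{mi}(\mathbf{r})$ are replaced by their Fourier
transforms,

$$W^{1}_{mi}(\mathbf{k}) = f_{mi}(k)\cos(m\theta_k + \phi_{mi})$$

$$W^{2}_{mi}(\mathbf{k}) = f_{mi}(k)\sin(m\theta_k + \phi_{mi}) \qquad (9)$$

in which

$$f_{mi}(k) = \pi j^{m} \int_0^a J_m\!\left(r/\sqrt{\lambda_{mi}}\right) J_m(kr)\,r\,dr, \qquad (10)$$

where $j^2 = -1$. If we denote

$$T(\theta_k) = \sum_l t_l \cos\big(l(\theta_k - \alpha_l)\big), \qquad (11)$$

which is the Fourier expansion of $T(\theta_k)$, the representation of the perturbation matrix with
respect to the two degenerate eigen-functions has the form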

$$\left(C'_{uv}\right)_{(u=1,2;\,v=1,2)} = \left(\int W^{u*}_{mi}(\mathbf{k})\,C'(\mathbf{k})\,W^{v}_{mi}(\mathbf{k})\,d^2k\right)_{(u=1,2;\,v=1,2)} = g_{mi}\begin{pmatrix}\cos\theta & \sin\theta \\ \sin\theta & -\cos\theta\end{pmatrix} \qquad (12)$$

in which $\theta = 2\phi_{mi} + 2m\alpha_{2m}$ and $g_{mi} = \int U(k)\,|f_{mi}(k)|^2\,k\,dk$. Since the two eigen-functions
are degenerate, any linear combination of these two eigen-functions is an eigen-function of $C$.
Therefore, all we have to do is to find a linear combination of them which diagonalizes the
perturbation matrix, i.e., to find the eigenvalues and eigen-vectors of the matrix in equation
12, which are

$$\begin{pmatrix}\cos(\theta/2) \\ \sin(\theta/2)\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}-\sin(\theta/2) \\ \cos(\theta/2)\end{pmatrix} \qquad (13)$$

with eigenvalues $g_{mi}$ and $-g_{mi}$ respectively. Furthermore, if $U(k) = \epsilon/k^2$ then the complete
expression for the correction to the eigenvalue takes the form $g_{mi} = \epsilon\lambda_{mi}t_{2m}/2$.

Thus the eigen-functions and eigenvalues after the perturbation can be readily written out
as

$$W^{+}_{mi}(\mathbf{r}) = J_m\!\left(r/\sqrt{\lambda_{mi}}\right)\cos\big(m(\theta - \alpha_{2m})\big)$$

$$W^{-}_{mi}(\mathbf{r}) = J_m\!\left(r/\sqrt{\lambda_{mi}}\right)\sin\big(m(\theta - \alpha_{2m})\big) \qquad (14)$$

with eigenvalues $\lambda^{+}_{mi} = \lambda_{mi} + g_{mi}$ and $\lambda^{-}_{mi} = \lambda_{mi} - g_{mi}$, respectively, where $\theta$ is again
the polar angle of $\mathbf{r}$. So the degeneracy is
broken. This is in agreement with Hancock's simulations. These solutions have an important
feature: their phases are determined by the properties of the real world covariance
matrix. If the covariance matrix has a definite symmetry with an inclination angle $\alpha$, then
the solutions also have the same symmetry angle, because in this case $\alpha_{2m} = \alpha$ for
all $m$. The spectrum of the covariance matrix, shown in figure 7 of Field's paper, indeed
indicates a symmetry axis along $\alpha = 0$. Thus equation 14 predicts the zero phase result
found in Hancock's simulation. When Hancock used images which were tilted by 45 degrees
before being scanned, the preferred axis of the receptive fields was found to be 45 degrees.
Again this is predicted by equation 14, because the symmetry axis of the covariance matrix
spectrum also gets rotated by 45 degrees due to the rotated images, i.e., $\alpha = 45°$, and thus
the solutions also get rotated by 45 degrees.
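
The step from the eigenvectors of equation 13 to the angular dependence in equation 14 follows from the angle-addition formulas. Written out in k space, with $\theta = 2\phi_{mi} + 2m\alpha_{2m}$ denoting the angle that appears in equation 12:

```latex
\begin{aligned}
\cos\tfrac{\theta}{2}\,W^{1}_{mi}(\mathbf{k}) + \sin\tfrac{\theta}{2}\,W^{2}_{mi}(\mathbf{k})
  &= f_{mi}(k)\cos\!\left(m\theta_k + \phi_{mi} - \tfrac{\theta}{2}\right)
   = f_{mi}(k)\cos\!\big(m(\theta_k - \alpha_{2m})\big), \\
-\sin\tfrac{\theta}{2}\,W^{1}_{mi}(\mathbf{k}) + \cos\tfrac{\theta}{2}\,W^{2}_{mi}(\mathbf{k})
  &= f_{mi}(k)\sin\!\left(m\theta_k + \phi_{mi} - \tfrac{\theta}{2}\right)
   = f_{mi}(k)\sin\!\big(m(\theta_k - \alpha_{2m})\big).
\end{aligned}
```

The undetermined phase $\phi_{mi}$ cancels in both combinations, which is why the perturbed solutions depend only on $\alpha_{2m}$, a property of the covariance matrix.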

2.3  Discussion 

We have calculated the forms of the principal components of natural images based on
Field's result for the covariance matrix, and have shown that a non-rotationally symmetric
perturbation can break the degeneracy and give a definite phase which depends only on the
properties of the real world covariance matrix. These results largely agree with the
numerical simulation.

The  neurobiological  relevance  of  the  type  of  technique  used  in  this  paper  is  that  we  can 
deduce  for  different  learning  rules  what  kinds  of  receptive  fields  they  should  produce.  Given 
these  receptive  fields,  we  can  compare  them  to  the  real  biological  receptive  fields.  This 
comparison  can  be  used  to  assess  whether  the  biological  hardware  really  implements  or 
approximates  a  theoretically  proposed  learning  rule. 

The most obvious conclusion which stands out when we observe the results in figure 8 is
that these receptive fields have little resemblance to receptive fields reported in the biological
literature. Does this imply that biological neurons are not principal component analyzers?
When addressing this question we have to keep in mind that the natural images projected
on the retina undergo preprocessing in the retina and LGN before they reach the visual
cortex. Similar preprocessing should therefore be applied to natural images in simulations
and analytic studies before a sensible answer can be given.

3 Hybrid Network Techniques

We have previously shown that hybrid network techniques can significantly improve network
performance on difficult real-world problems. Below, we develop a firm mathematical
framework for the observed network performance improvement.

3.1 Basic Ensemble Method

Consider the following regression problem

$$y = f(x) + n$$

where $y$ is a random variable with mean $f(x) = E[y|x]$ and $n$ is independent zero-mean
noise.¹ We present the Basic Ensemble Method (BEM), which combines a population of
regression estimates, $f_i(x)$, to estimate a function $f(x)$.

Suppose that we have two finite data sets whose elements are all independent and identically
distributed random variables: a training data set $A = \{(x_m, y_m)\}$ and a cross-validatory data
set $CV = \{(x_l, y_l)\}$. Further suppose that we have used $A$ to generate a set of functions,
$\mathcal{F} = \{f_i(x)\}$, each element of which approximates $f(x)$.² We would like to find the best
approximation to $f(x)$ using $\mathcal{F}$.

One common choice is to use the naive estimator, $f_{\mathrm{Naive}}(x)$, which minimizes the empirical
mean square error relative to $f(x)$,³

$$\mathrm{MSE}[f_i] = E_{CV}\big[(y_l - f_i(x_l))^2\big],$$

thus

$$f_{\mathrm{Naive}}(x) = \arg\min_i \{\mathrm{MSE}[f_i]\}.$$

This choice is unsatisfactory for two reasons: First, in selecting only one regression estimate
from the population of regression estimates represented by $\mathcal{F}$, we are discarding potentially
useful information that is stored in the discarded regression estimates; second, since the
CV data set is random, there is a certain probability that some other network from the
population will perform better than the naive estimate on some other previously unseen
data set sampled from the same distribution. A more reliable estimate of the performance
on previously unseen data is the average of the performances over the population $\mathcal{F}$. Below,
we will see how we can avoid both of these problems by using the BEM estimator, $f_{\mathrm{BEM}}(x)$,
and thereby generate an improved regression estimate.

¹ The noise for minimizing the MSE is assumed to be Gaussian; but this assumption is not necessary for
what follows.

² For our purposes, it does not matter how $\mathcal{F}$ was generated, unlike Monte Carlo. In practice we will
use a set of back propagation networks trained on the $A$ data set but started with different random weight
configurations. This replication procedure is standard practice when trying to optimize neural networks.

³ Here, and in all that follows, the expected value is taken over the cross-validatory set CV.

Define the misfit of function $f_i(x)$, the deviation from the true solution, as $m_i(x) = f(x) - f_i(x)$.
The empirical mean square error can now be written in terms of $m_i(x)$ as

$$\mathrm{MSE}[f_i] = E[m_i^2].$$

The average mean square error is therefore

$$\overline{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N} E[m_i^2].$$

Define the BEM regression function, $f_{\mathrm{BEM}}(x)$, as

$$f_{\mathrm{BEM}}(x) = \frac{1}{N}\sum_{i=1}^{N} f_i(x) = f(x) - \frac{1}{N}\sum_{i=1}^{N} m_i(x).$$

If we now assume that the $m_i(x)$ are mutually independent with zero mean, we can calculate
the mean square error of $f_{\mathrm{BEM}}(x)$ as

$$\begin{aligned}
\mathrm{MSE}[f_{\mathrm{BEM}}] &= E\Big[\Big(\frac{1}{N}\sum_{i=1}^{N} m_i\Big)^2\Big] \\
&= \frac{1}{N^2}\,E\Big[\sum_{i=1}^{N} m_i^2\Big] + \frac{1}{N^2}\,E\Big[\sum_{i \neq j} m_i m_j\Big] \\
&= \frac{1}{N^2}\,E\Big[\sum_{i=1}^{N} m_i^2\Big] + \frac{1}{N^2}\sum_{i \neq j} E[m_i]\,E[m_j] \\
&= \frac{1}{N^2}\,E\Big[\sum_{i=1}^{N} m_i^2\Big], \qquad (15)
\end{aligned}$$

which implies that

$$\mathrm{MSE}[f_{\mathrm{BEM}}] = \frac{1}{N}\,\overline{\mathrm{MSE}}. \qquad (16)$$

This is a powerful result because it tells us that by averaging regression estimates, we can
reduce our mean square error by a factor of $N$ when compared to the population performance:
by increasing the population size, we can in principle make the estimation error arbitrarily
small! In practice however, as $N$ gets large our assumptions on the misfits, $m_i(x)$, eventually
break down. In particular, the assumption that $E[m_i m_j] = E[m_i]E[m_j]$ is no longer valid.
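
The factor of $N$ and its breakdown can be illustrated numerically. The sketch below draws synthetic misfits directly rather than training networks; the Gaussian misfits, the optional shared error term and the function name are assumptions made for illustration only.

```python
import random

def simulate_bem(n_estimators, shared_std=0.0, n_points=20000, seed=0):
    """Return (average individual MSE, MSE of the averaged estimate).

    Each misfit is a unit-variance, zero-mean Gaussian plus an optional
    error shared by all estimators. With shared_std == 0 the misfits are
    independent, as assumed in the derivation of the 1/N result.
    """
    rng = random.Random(seed)
    sum_individual = 0.0
    sum_ensemble = 0.0
    for _ in range(n_points):
        shared = rng.gauss(0.0, shared_std)  # exactly zero when shared_std == 0
        misfits = [shared + rng.gauss(0.0, 1.0) for _ in range(n_estimators)]
        sum_individual += sum(m * m for m in misfits) / n_estimators
        sum_ensemble += (sum(misfits) / n_estimators) ** 2  # misfit of the average
    return sum_individual / n_points, sum_ensemble / n_points

avg_mse, bem_mse = simulate_bem(n_estimators=10)
ratio_independent = avg_mse / bem_mse          # close to N = 10

avg_mse_c, bem_mse_c = simulate_bem(n_estimators=10, shared_std=0.5)
ratio_correlated = avg_mse_c / bem_mse_c       # noticeably below N

print(round(ratio_independent, 1), round(ratio_correlated, 1))
```

With independent misfits the ratio stays close to the population size, while even a modest shared error component caps the improvement, since averaging cannot remove an error that all estimators have in common.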

Consider the individual elements of the population $\mathcal{F}$. These estimators will more or less
follow the true regression function. If we think of the misfit functions as random noise
functions  added  to  the  true  regression  function  and  these  noise  functions  are  uncorrelated 
with  zero  mean,  then  the  averaging  of  the  individual  estimates  is  like  averaging  over  the 
noise.  In  this  sense,  the  ensemble  method  is  smoothing  in  functional  space  and  can  be 
thought  of  as  a  regularizer  with  a  smoothness  assumption  on  the  true  regression  function. 

An additional benefit of the ensemble method’s ability to combine multiple regression
estimates is that the regression estimates can come from many different sources. This fact
allows for flexibility in the application of the ensemble method. For example, the regression
estimates  can  have  different  functional  forms;  or  can  be  selected  using  different  optimization 
(i.e.  “training”)  algorithms;  or  can  be  selected  by  optimizing  over  different  data  sets.  This 
last  option  -  optimizing  on  different  data  sets  -  has  important  ramifications.  One  standard 
method  for  avoiding  over-fitting  during  training  is  to  use  a  cross-validatory  hold-out  set. 
The  cross-validatory  hold-out  set  is  a  subset  of  the  total  data  available  to  us  and  is  used 
to  determine  when  to  stop  training.  The  hold-out  data  is  not  used  to  train.  The  problem 
is  that  since  we  use  cross-validation  to  avoid  over-fitting,  each  regression  estimate  is  never 
trained  on  the  hold-out  data  (i.e.  the  cross-validatory  data  set)  and  therefore,  each  regression 
estimate  “sees”  only  part  of  the  data  and  may  be  missing  valuable  information  about  the 
distribution of the data, particularly if the total data set is small. This will always be the
case for a single regression estimate using a cross-validatory stopping rule. However, this is
not a problem for the ensemble estimator. When constructing our population, $\mathcal{F}$, we can
train  each  regression  estimate  on  the  entire  training  set  and  let  the  smoothing  property  of 
the  ensemble  process  remove  any  over-fitting  or  we  can  train  each  regression  estimate  in  the 
population  with  a  different  split  of  training  and  hold-out  data.  In  this  way,  the  population 
as  a  whole  will  have  seen  the  entire  data  set  while  each  regression  estimate  has  avoided 
over-fitting  by  using  a  cross-validatory  stopping  rule.  Thus  the  ensemble  estimator  will  see 
the entire data set while the naive estimator will not. In general, with this framework we can
now  easily  extend  the  statistical  jackknife,  bootstrap  and  cross-validation  techniques  (Efron, 
1982;  Miller,  1974;  Stone,  1974)  to  find  better  regression  functions. 
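
One way to give each member of the population a different split of training and hold-out data is sketched below. The k-fold style rotation and the function name are illustrative choices, not a description of the procedure actually used.

```python
import random

def rotating_splits(n_samples, n_estimators, seed=0):
    """Give each ensemble member its own (train, hold_out) index lists.

    The shuffled indices are cut into n_estimators folds; member i holds
    out fold i for its stopping rule and trains on the remaining folds.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_estimators] for i in range(n_estimators)]
    splits = []
    for i, hold_out in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        splits.append((train, hold_out))
    return splits

splits = rotating_splits(n_samples=100, n_estimators=5)

# Collect every sample that at least one member trains on.
seen_in_training = set()
for train, hold_out in splits:
    seen_in_training.update(train)
print(len(seen_in_training))
```

In this example each member trains on 80 samples and holds out 20, yet the population as a whole trains on all 100 samples, which is the property described above.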

answer any questions you might have concerning this report.

Figure 2: CBC simulation of normal rearing. The graph displays the maximum response to
the training data for the left and right eyes.

Figure 3: Simulation of normal rearing with realistic input. The 2D maps are the receptive
fields for the left and right eyes at the beginning and end of the simulation. The upper graph
shows the maximum of the left and right eye maps throughout the simulation.

Figure 4: CBC simulation of monocular deprivation. The graph displays the maximum
response to the training data for the left and right eyes.

Figure 5: Simulation of monocular deprivation with realistic input. The 2D maps are the
receptive fields for the left and right eyes at the beginning and end of the simulation. The
upper graph shows the maximum of the left and right eye maps throughout the simulation.

Figure 6: CBC simulation of reverse suture. The graph displays the maximum response to
the training data for the left and right eyes.

Figure 7: Simulation of reverse suture with realistic input. The 2D maps are the receptive
fields for the left and right eyes at the beginning, middle and end of the simulation. The
upper graph shows the maximum of the left and right eye maps throughout the simulation.